Linguistic tuple segmentation in ngram-ba

Author

  • Adrià de Gispert
Abstract

Ngram-based Statistical Machine Translation relies on a standard Ngram language model of tuples to estimate the translation process. In training, this translation model requires a segmentation of each parallel sentence, which involves making a hard decision on tuple segmentation whenever a word is left unlinked during word alignment. This is especially critical when the unlinked word appears in the target language, as the hard decision is then compulsory. In this paper we present a thorough study of this situation, comparing for the first time each of the proposed techniques in two independent tasks, namely the English–Spanish European Parliament Proceedings large-vocabulary task and the Arabic–English Basic Travel Expressions small-data task. In light of this comparison, we present a novel segmentation technique which incorporates linguistic information. Results obtained in both tasks outperform all previous techniques.


Similar resources

Linguistic tuple segmentation in n-gram-based statistical machine translation

Ngram-based Statistical Machine Translation relies on a standard Ngram language model of tuples to estimate the translation process. In training, this translation model requires a segmentation of each parallel sentence, which involves taking a hard decision on tuple segmentation when a word is not linked during word alignment. This is especially critical when this word appears in the target lan...


Segmentación lingüística de tuplas para el modelado de la traducción estocástica mediante n-gramas

Ngram-based Statistical Machine Translation relies on a standard Ngram language model of tuples to estimate the translation process. In training, this translation model requires a segmentation of each parallel sentence, which involves taking a hard decision on tuple segmentation when a word is not linked during word alignment. This is especially critical when this word appears in the target lan...


2-tuple intuitionistic fuzzy linguistic aggregation operators in multiple attribute decision making

In this paper, we investigate multiple attribute decision making (MADM) problems with 2-tuple intuitionistic fuzzy linguistic information. We then use arithmetic and geometric operations to develop some 2-tuple intuitionistic fuzzy linguistic aggregation operators. The prominent characteristics of these proposed operators are studied. Then, we have utilized these operators to develop some app...


A Hybrid Multi-attribute Group Decision Making Method Based on Grey Linguistic 2-tuple

Because of the complexity of the decision-making environment, fuzzy uncertainty and grey uncertainty may coexist in multi-attribute group decision-making problems. In this paper, we study multi-attribute group decision-making problems with hybrid grey attribute data (the precise values, interval numbers and linguistic fuzzy variables coexist, and each attribute val...


Reordered Search and Tuple Unfolding for Ngram-based SMT

In Statistical Machine Translation, the use of reordering for certain language pairs can produce a significant improvement in translation accuracy. However, the search problem is shown to be NP-hard when arbitrary reorderings are allowed. This paper addresses the question of reordering for an Ngram-based SMT approach following two complementary strategies, namely reordered search and tuple unfo...




Publication date: 2006